Advanced topic 2, 2012
MMS: An Autonomic Network-Layer Foundation for Network Management (Topic 2 from Advanced Topics 2012)

I. INTRODUCTION

The main functions of modern service provider networks are providing basic packet delivery services, securing computing resources, ensuring application performance, enhancing application reliability and enabling utility computing services. To control their routers, commercial networks have relied on dial-up modems, orthogonal Ethernet networks and in-band connectivity. However, these methods are neither self-healing nor self-optimizing, and they are insecure. The Meta-Management System (MMS) is a subsystem designed to solve these problems: it is itself autonomic and provides robust and universal support for management plane communications. The design of the MMS addresses the real-world constraints imposed by the network environment in which it must operate.

II. TECHNIQUES FROM OTHER DOMAINS

*All routing techniques except static routing are self-healing in that they respond to link or node failures and re-route. However, not all routing techniques are self-configuring: many depend on manually configured parameters, and any misconfiguration of these parameters could render the network inoperable.
*Ethernet [1] is a self-configuring and self-healing system, but it is neither self-protecting nor self-optimizing. Any host on an Ethernet can launch a denial of service attack by flooding the entire network.
*Mobile ad hoc networks are self-configuring and self-healing. However, these techniques may sacrifice routing efficiency in favor of node mobility and minimal packet transmission energy consumption, which are not our primary concerns.
*Routing techniques fixed on forwarding data via the shortest paths (e.g. OSPF, IS-IS, RIP) give too much power to any compromised node, whereas a self-protecting technique needs more flexible control over routing.
*One class of techniques uses asymmetric public key cryptography to authenticate messages. However, asymmetric cryptography is computationally very expensive.
*To avoid asymmetric cryptography, some techniques simply use a single shared secret key among all nodes [2]. However, such techniques require a large amount of management configuration.
*Hash chain techniques are most effective for broadcast traffic authentication but do not outperform pair-wise shared secret techniques.

To meet the needs of autonomic management plane communications, the solution should strike a balance between computation overhead, complexity and security by automatically establishing shared secret keys and by using efficient symmetric cryptography for packet handling.

III. MMS DESIGN AND IMPLEMENTATION

In general, the MMS module runs on both network elements (NEs), e.g. routers and switches, and management stations (MSs).

''A. MMS Features Overview''

The MMS offers the following features in an automated fashion. Each is self-configuring (self-C), self-healing (self-H), self-optimizing (self-O) or self-protecting (self-P):

*Automatic creation of management channels (Self-C): the MMS automatically establishes end-to-end management communication channels between the MS and the NEs in a secured manner using security certificates.
*Integrated security assurance (Self-P): the MMS is designed to be robust to DoS (Denial of Service) attacks at the MS and to the compromise of NEs.
*Integrated liveness assurance (Self-H, Self-O, Self-P): the MMS always maintains the availability of the management channels. When network connectivity is lost, it dynamically re-routes the traffic.
*Handles large networks and a wide range of devices: CPU and memory requirements are placed on the MSs rather than on the NEs, because MSs are easier to upgrade, and the requirements on NEs are small. The MMS can therefore run on large networks.
*Evolvable after deployment: multiple instances of the MMS are allowed to run on the same network simultaneously, so the MMS can be upgraded or replaced with zero downtime.

''B. Partitioning Data Links for MMS Communication''

The MMS can operate in either an in-band fashion (over the same links that carry user data) or an out-of-band fashion (over dedicated links). In the in-band case, to prevent user traffic from interfering with MMS traffic, QoS (Quality of Service) can be applied so that MMS traffic has the highest priority and is served first by the scheduler.

''C. Automatic Construction of Secure Channels''

This section explains how the MMS establishes and maintains secure management channels.

'1. Threat Model : ' The MMS is designed to resist the following threats:

*Operator error: network elements might be improperly configured.
*Attack from an end-host: hosts connected to the network might attempt to launch DoS attacks against the management channel.
*Compromise of a network element: attackers might compromise any NE and use it to perform DoS attacks in the network.

'2. Minimizing State Held by Network Elements : ' The MMS reduces configuration requirements as much as possible in order to eliminate configuration errors. The first step in building a secure channel is to securely authenticate its endpoints. This relies on a network certificate at each NE, an MS certificate at each MS, and a private/public key pair that identifies each NE.

'3. Secure Routing : ' Traffic forwarding in the MMS is controlled by onion-encrypted source routes in the headers of the MMS frames. Each source route lists the series of NEs through which the frame must pass. A source route is built like an onion: the list of hops remaining in the route is encrypted with the secret key of the NE that makes the next forwarding operation. Onion routing is used for two major reasons. First, it creates a secure log of the frame's traversed path, which only the MS can fully decrypt since it knows the secret keys of all NEs. Second, the MMS on each NE does not need to maintain a routing table that grows with the network size.
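The layered structure of an onion-encrypted source route can be illustrated with a minimal Python sketch. It is not the MMS implementation: the frame layout, the `DELIVER` marker, the nonce and all function names are assumptions made for illustration, and a toy XOR keystream built from HMAC-SHA256 stands in for the symmetric cipher so that the example runs with the standard library only. The sketch also omits integrity protection and the path log described above.

```python
import hashlib
import hmac
import struct

DELIVER = 0  # hypothetical marker meaning "this NE is the final hop"


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Derive `length` pseudo-random bytes from key and nonce with HMAC-SHA256."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        block = hmac.new(key, nonce + struct.pack(">I", counter), hashlib.sha256)
        out += block.digest()
        counter += 1
    return bytes(out[:length])


def toy_cipher(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Toy stand-in for a real symmetric cipher: XOR with a keystream.

    The same call both encrypts and decrypts. Not secure; illustration only.
    """
    stream = _keystream(key, nonce, len(data))
    return bytes(a ^ b for a, b in zip(data, stream))


def build_onion(route: list, keys: dict, nonce: bytes) -> bytes:
    """MS side: wrap the route from the last hop outwards.

    Each layer is (32-bit next-hop ID + remaining onion), encrypted with the
    key of the NE that will process that layer.
    """
    blob = b""
    next_hop = DELIVER
    for node_id in reversed(route):
        layer = struct.pack(">I", next_hop) + blob
        blob = toy_cipher(keys[node_id], nonce, layer)
        next_hop = node_id
    return blob


def peel(key: bytes, nonce: bytes, blob: bytes):
    """NE side: remove one layer and learn only the next hop."""
    layer = toy_cipher(key, nonce, blob)
    (next_hop,) = struct.unpack(">I", layer[:4])
    return next_hop, layer[4:]


def trace_path(first_hop: int, onion: bytes, keys: dict, nonce: bytes) -> list:
    """Simulate hop-by-hop forwarding and return the NEs visited."""
    path, node_id, blob = [], first_hop, onion
    while True:
        path.append(node_id)
        next_hop, blob = peel(keys[node_id], nonce, blob)
        if next_hop == DELIVER:
            return path
        node_id = next_hop


# Hypothetical per-NE secret keys, all known to the MS.
keys = {3: b"key-of-NE-3", 7: b"key-of-NE-7", 5: b"key-of-NE-5"}
nonce = b"route-0001"
onion = build_onion([3, 7, 5], keys, nonce)
print(trace_path(3, onion, keys, nonce))  # [3, 7, 5]
```

Each NE in this sketch needs only its own key to learn the next hop, which mirrors the point that an NE does not have to hold a routing table; only a party holding every key, such as the MS, can read the whole route.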
The MMS uses a '''recursive authentication procedure''' to authenticate the NEs and to create and send them the encrypted source routes that they will use to communicate with the MS.

*Step 1: The MS initiates the authentication process with its directly connected NEs by sending them an authentication challenge.
*Step 2: After verifying certificates, each NE proves its identity and verifies that the MS is valid.
*Step 3: Each authenticated NE obtains an onion-encrypted source route that it uses to communicate with the MS.
*Step 4: The NEs send their LSAs (Link State Advertisements) to the MS, informing it of any new NEs.
*Step 5: The MS recursively authenticates those new NEs by sending them challenges via the already authenticated NEs, and so on.

By using recursive authentication, an MS can establish MMS secure channels to around 1,000 core devices within 30 seconds.

'4. Resilience to Failures : ' When the topology changes, LSAs are generated and sent to the MS, which recalculates the onion-encrypted source routes for the affected NEs and sends the new routes to them.

'5. Resilience to Attacks : ' Under the security framework, only authenticated NEs can communicate with MSs via the MMS. With the use of traffic isolation and scheduling techniques, DoS attacks cannot disrupt management communication. If an NE is compromised, the attacker cannot modify the MMS frames because of the onion encryption.

''D. Assuring Liveness''

#'Protecting Against CPU Starvation:' A common issue on NEs is CPU starvation caused by a run-away process or a data-plane DoS attack. To enable recovery from this type of situation, the MMS provides a process management API and a packet filtering API. Using these APIs, an MS can command the MMS to return a list of the processes running on a network element, kill a particular process, change a process's priority, install an IP data plane packet filter, or reboot the NE. These mechanisms allow an operator to remotely restore liveness and reconfigure the NE.
#'Protecting Against Memory Outages:' The MMS is designed to avoid "out of memory" errors by using static rather than dynamic memory allocation [3]. In this way, as long as the MMS is successfully loaded at system startup time, it is unlikely to be impaired by memory allocation problems caused by misbehaving processes.

''E. Evolving the MMS after Deployment''

Our approach to robustly evolving the MMS is to allow multiple versions of the system to operate over the same network at the same time. This allows the new version to be brought up and thoroughly tested before the old version is removed. Each version of the MMS operates independently and in parallel.

''F. MMS Interfaces for Communication and Recovery''

The MMS provides two key APIs: one for remote recovery to address liveness issues, and another to support existing network management applications that use TCP/IP protocols for communication.

'1. MMS API for Remote Recovery: '

*Through the process management API, an MS can command the MMS to return a list of the processes running on a network element, kill a particular process, change a process's priority, start a process, or reboot the NE.
*The MMS packet filtering API acts as a firewall and allows IP data plane packet filters to be installed directly via the MMS. When the packet filtering API is remotely invoked, a packet filter rule is sent from an MS to a network element. The MMS on the target NE communicates the rule directly to the packet filtering kernel module, without competing with any user space applications for resources.

'2. MMS API for Communication: ' When an MS is plugged into a network, the MMS presents a virtual management LAN that includes the MS and all authenticated NEs. Each node in the virtual management LAN is assigned a unique MMS management address. The address has the same length as an IPv4 address, so that existing management applications can send messages to and receive messages from a management address.

''G. MMS Implementation''

MMS traffic is captured by a trap in the network stack and bypasses layer-3 IP processing completely. On the MS, traffic sent by a management application is injected into the MMS; the traffic is forwarded by the MMS on intermediate NEs and is delivered via the MMS to the application running on the receiver NE.

*The RSA algorithm is used for asymmetric cryptographic operations such as authentication and verification of messages between the MS and the NEs.
*AES is used for symmetric encryption and decryption of the MMS frames during message forwarding, and also by the MS while constructing the onion-encrypted source routes for the NEs.
*The node ID is chosen to be 32 bits, the same size as an IPv4 address. This enables external management applications to communicate via the MMS easily by using the node ID as a management IP address.
*The kernel module is the only software required to run the MMS. Since the MS and the NEs share common tasks, a single module implements the functionality required for both the MS and the NE.

IV. PERFORMANCE EVALUATION

''A. Low Forwarding Overhead''

'1. Delay overhead measurement: ' To measure the delay overhead, nodes are connected with 1 Gbps Ethernet links to form a linear chain topology, and ICMP packets are exchanged between the transmitter and the receiver. The figure below compares the round-trip delays of ICMP packets carried by the MMS and by the regular IP data channel. The round-trip delays increase linearly with hop count, and the latency added by the MMS is less than 0.1 milliseconds per hop.

'2. Throughput overhead measurement: ' To measure the throughput overhead of the MMS, a three-node chain topology is used, with an MS as the sender, an NE as the forwarder, and a second NE as the receiver, and the bandwidth of the links connecting the three nodes is varied.
The throughput difference between the MMS and the regular IP data channel is shown in the figure below. The difference becomes noticeable only after the link bandwidth increases to 400 Mbps, and the best TCP throughput the MMS achieved is 800 Mbps.

''B. Resilient Routing''

During network failures, NEs send LSAs to the MSs, and the MSs re-compute and push out updated onion-encrypted source routes to the NEs. However, when multiple failures occur at the same time, some LSAs might fail to reach the MS. The scenario shown in the figure below is used to evaluate the MMS's ability to maintain working communications in the presence of such failures. In this scenario, R1 simultaneously loses two links, and its initial LSAs to the MS are lost; the MS detects the failure of its link to R1 and informs R1 to re-route through R2; an LSA from R1 then gets through, allowing the MS to re-compute and push a new route to R3. The table shows the timeline of the steps taken during re-convergence. When more LSA retransmissions are used, the MMS eventually re-converges even after multiple failures.

''C. Fast Secure-Channel Setup''

MMS channel setup time is measured on three types of topologies: the Abilene backbone topology, an ISP backbone topology and a set of production enterprise network topologies. The figure below shows the predicted and measured channel setup time for each topology. According to this figure, a new MS plugged into a network with a thousand NEs takes only about 30 seconds to build secure channels to all NEs.

V. CASE STUDIES

MMS mechanisms can be used to solve concrete problems that arise in network management.

''A. Self-Optimization in Wireless Mesh Access Networks''

In wireless mesh networks, links can be asymmetric and can have unpredictable packet loss rates. The MMS can handle such lossy links: it uses link quality estimates to detect the links with high packet loss.

''B. Recovery from Control and Management Plane Overload''

A router's control and management planes run a variety of applications. Software bugs, network operation errors and network attacks can cause applications to consume excessive computing resources and can even render a router unreachable or unable to respond to remote management commands. For example, during the outbreak of the Slammer worm, many routers and switches became unresponsive because their CPUs and memories were overwhelmed, forcing operators to physically visit the affected devices to install packet filters to block the worm traffic. This dramatically increased the time required to get the network back under control. In situations where the control and management planes are threatened by resource starvation, the MMS can mitigate the threat through its packet filtering and process management APIs.

VI. CONCLUSIONS

The MMS includes special recovery APIs that can be extremely useful in practice, and its latency and throughput performance will meet the requirements of many demanding management applications. A limitation of the current design is that the MS is assumed to be secure; techniques to protect the MS need to be explored in the future.

REFERENCES

[1] LAN/MAN Standards Committee of the IEEE Computer Society, "IEEE Standard for Local and metropolitan area networks: Media Access Control (MAC) Bridges - 802.1D," 2004.

[2] S. Basagni, K. Herrin, D. Bruschi, and E. Rosti, "Secure pebblenets," in Proc. the 2nd ACM international symposium on Mobile ad hoc networking & computing, pp. 156–163, ACM New York, NY, USA, 2001.

[3] N. Cooprider, W. Archer, E. Eide, D. Gay, and J. Regehr, "Efficient Memory Safety for TinyOS," in Proc. ACM International Conference on Embedded Networked Sensor Systems (Sensys), 2007.